Bellabeat, a technology company that produces smart health products, wants to examine smart-device usage patterns, here drawn from public Fitbit tracker data, to better understand how people use such devices. Based on the findings, the company wants strategic recommendations on how these trends can inform its marketing approach.
○ Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.
○ Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team.
○ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
● FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users. The thirty eligible users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
The data is stored in several tables, organized by the frequency and the type of observation. This analysis uses the tables recorded at daily level:
1. dailyActivity_merged.csv
2. dailyCalories_merged.csv
3. dailyIntensities_merged.csv
4. dailySteps_merged.csv
5. sleepDay_merged.csv
6. weightLogInfo_merged.csv
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
activity <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
calories <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
intensities <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
steps <- read_csv("Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepday <- read_csv("Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight <- read_csv("Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
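The six read_csv() calls above follow one pattern, so the paths can be built once and read in a loop. The following is a minimal sketch under that assumption, not the approach used in this report; the names `data_dir`, `daily_files`, `daily_paths`, and `tables` are illustrative.

```r
# Folder and file names are copied from the read_csv() calls above;
# the variable names are illustrative.
data_dir <- "Fitabase Data 4.12.16-5.12.16"

daily_files <- c(
  activity    = "dailyActivity_merged.csv",
  calories    = "dailyCalories_merged.csv",
  intensities = "dailyIntensities_merged.csv",
  steps       = "dailySteps_merged.csv",
  sleepday    = "sleepDay_merged.csv",
  weight      = "weightLogInfo_merged.csv"
)

# file.path() joins folder and file name with "/"; names are re-attached
# so that each table can later be looked up by its short name.
daily_paths <- file.path(data_dir, daily_files)
names(daily_paths) <- names(daily_files)

# Read only the files that are present, so the sketch also runs without the data.
# show_col_types = FALSE silences the column specification message.
found <- daily_paths[file.exists(daily_paths)]
tables <- lapply(found, function(p) readr::read_csv(p, show_col_types = FALSE))
```

With the data folder in place, `tables$activity` would hold the same tibble as `activity` above.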
Before merging, the date column that the tables share needs a common name, so it is renamed to date in every table and then parsed into a proper date.
activity <- rename( activity, date = ActivityDate)
calories <- rename( calories, date = ActivityDay)
intensities <- rename( intensities, date = ActivityDay)
sleepday <- rename( sleepday, date = SleepDay)
steps <- rename( steps, date = ActivityDay)
weight <- rename( weight, date = Date)
activity <- mutate( activity, date = mdy(date))
sleepday <- mutate(sleepday, date = as_date(mdy_hms(date)))
weight <- mutate(weight, date = as_date(mdy_hms(date)))
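The daily activity table stores plain dates, while the sleep and weight tables store date-times, which is why two different lubridate parsers appear above. A small sketch of the difference follows; the two sample strings are assumed examples of the raw formats, not values read from the files.

```r
library(lubridate)

# Sample strings in the two formats assumed for the raw files (illustrative values).
daily_stamp <- "4/12/2016"              # date only, as in ActivityDate
sleep_stamp <- "4/12/2016 12:00:00 AM"  # date plus 12-hour clock time, as in SleepDay

d1 <- mdy(daily_stamp)                  # parsed straight to a Date
d2 <- as_date(mdy_hms(sleep_stamp))     # parsed to POSIXct (UTC), then reduced to a Date

class(d1)  # "Date"
d1 == d2   # TRUE: both are 2016-04-12, so the columns can serve as a join key
```

Reducing both columns to the Date class keeps the key types identical on both sides of a later merge.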
Add a column for the day of the week.
activity <- activity %>% mutate( Weekday = weekdays(date))
Next, the tables are merged so that the relevant information sits in a single table for ease of analysis. The activity table already contains the columns of the calories, steps, and intensities tables, so only the sleep and weight tables are merged in.
sleep_merged <- merge(activity,sleepday,by=c("Id","date"))
weight_merged <- merge(sleep_merged,weight, by = c("Id", "date"))
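By default merge() keeps only rows whose keys appear in both tables, which is why the row counts shrink from 940 (activity) to 413 (sleep_merged) to 35 (weight_merged) in the summaries in this report. The sketch below shows the effect on toy tables with hypothetical values; `all.x = TRUE` is the base R option for keeping every row of the first table.

```r
# Toy tables (hypothetical values) with the same key columns as the real data.
act <- data.frame(
  Id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(5000, 7000, 9000)
)
slp <- data.frame(
  Id = c(1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(400, 450)
)

# Default: inner join, only Id/date pairs present in both tables survive.
inner <- merge(act, slp, by = c("Id", "date"))
# all.x = TRUE: keep every activity row and fill missing sleep values with NA.
left <- merge(act, slp, by = c("Id", "date"), all.x = TRUE)

nrow(inner)                          # 2
nrow(left)                           # 3
sum(is.na(left$TotalMinutesAsleep))  # 1
```

The inner join suits analyses that need both measurements on the same day; the left join would suit questions about how often sleep is logged at all.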
Now there are three tables: activity, sleep_merged, and weight_merged. From weight_merged, only the relevant columns are kept.
summary(activity)
## Id date TotalSteps TotalDistance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Min. : 0.000
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.: 3790 1st Qu.: 2.620
## Median :4.445e+09 Median :2016-04-26 Median : 7406 Median : 5.245
## Mean :4.855e+09 Mean :2016-04-26 Mean : 7638 Mean : 5.490
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:10727 3rd Qu.: 7.713
## Max. :8.878e+09 Max. :2016-05-12 Max. :36019 Max. :28.030
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.245 Median :0.0000 Median : 0.210
## Mean : 5.475 Mean :0.1082 Mean : 1.503
## 3rd Qu.: 7.710 3rd Qu.:0.0000 3rd Qu.: 2.053
## Max. :28.030 Max. :4.9421 Max. :21.920
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
## Calories Weekday
## Min. : 0 Length:940
## 1st Qu.:1828 Class :character
## Median :2134 Mode :character
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
summary(sleep_merged)
## Id date TotalSteps TotalDistance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 17 Min. : 0.010
## 1st Qu.:3.977e+09 1st Qu.:2016-04-19 1st Qu.: 5206 1st Qu.: 3.600
## Median :4.703e+09 Median :2016-04-27 Median : 8925 Median : 6.290
## Mean :5.001e+09 Mean :2016-04-26 Mean : 8541 Mean : 6.039
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:11393 3rd Qu.: 8.030
## Max. :8.792e+09 Max. :2016-05-12 Max. :22770 Max. :17.540
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.010 Min. :0.0000 Min. : 0.00
## 1st Qu.: 3.600 1st Qu.:0.0000 1st Qu.: 0.00
## Median : 6.290 Median :0.0000 Median : 0.57
## Mean : 6.034 Mean :0.1131 Mean : 1.45
## 3rd Qu.: 8.020 3rd Qu.:0.0000 3rd Qu.: 2.37
## Max. :17.540 Max. :4.0817 Max. :12.54
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. :0.010 Min. :0.0000000
## 1st Qu.:0.0000 1st Qu.:2.540 1st Qu.:0.0000000
## Median :0.4200 Median :3.680 Median :0.0000000
## Mean :0.7502 Mean :3.807 Mean :0.0009201
## 3rd Qu.:1.0400 3rd Qu.:4.930 3rd Qu.:0.0000000
## Max. :6.4800 Max. :9.480 Max. :0.1100000
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 2.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:158.0 1st Qu.: 631.0
## Median : 9.00 Median : 11.00 Median :208.0 Median : 717.0
## Mean : 25.19 Mean : 18.04 Mean :216.9 Mean : 712.2
## 3rd Qu.: 38.00 3rd Qu.: 27.00 3rd Qu.:263.0 3rd Qu.: 783.0
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1265.0
## Calories Weekday TotalSleepRecords TotalMinutesAsleep
## Min. : 257 Length:413 Min. :1.000 Min. : 58.0
## 1st Qu.:1850 Class :character 1st Qu.:1.000 1st Qu.:361.0
## Median :2220 Mode :character Median :1.000 Median :433.0
## Mean :2398 Mean :1.119 Mean :419.5
## 3rd Qu.:2926 3rd Qu.:1.000 3rd Qu.:490.0
## Max. :4900 Max. :3.000 Max. :796.0
## TotalTimeInBed
## Min. : 61.0
## 1st Qu.:403.0
## Median :463.0
## Mean :458.6
## 3rd Qu.:526.0
## Max. :961.0
weight_merged <- weight_merged %>%
  select(Id, date, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance,
         VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance,
         VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories,
         TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed, WeightKg, BMI, Weekday)
summary(weight_merged)
## Id date TotalSteps TotalDistance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 356 Min. : 0.250
## 1st Qu.:6.962e+09 1st Qu.:2016-04-18 1st Qu.: 5780 1st Qu.: 3.825
## Median :6.962e+09 Median :2016-04-28 Median :10524 Median : 6.960
## Mean :6.398e+09 Mean :2016-04-26 Mean : 9687 Mean : 6.523
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-03 3rd Qu.:12484 3rd Qu.: 8.730
## Max. :6.962e+09 Max. :2016-05-12 Max. :20031 Max. :13.240
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.250 Min. :0.0000 Min. :0.000
## 1st Qu.: 3.825 1st Qu.:0.0000 1st Qu.:0.000
## Median : 6.960 Median :0.0000 Median :1.200
## Mean : 6.464 Mean :0.2867 Mean :1.727
## 3rd Qu.: 8.610 3rd Qu.:0.0000 3rd Qu.:3.305
## Max. :13.240 Max. :4.0817 Max. :5.980
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. :0.25 Min. :0.000
## 1st Qu.:0.1900 1st Qu.:2.76 1st Qu.:0.000
## Median :0.7600 Median :3.91 Median :0.000
## Mean :0.9083 Mean :3.88 Mean :0.006
## 3rd Qu.:1.6800 3rd Qu.:4.88 3rd Qu.:0.000
## Max. :2.3900 Max. :7.04 Max. :0.110
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 32.0 Min. : 127.0
## 1st Qu.: 0.00 1st Qu.: 3.50 1st Qu.:197.0 1st Qu.: 635.5
## Median : 18.00 Median :15.00 Median :240.0 Median : 689.0
## Mean : 27.49 Mean :18.37 Mean :236.5 Mean : 688.5
## 3rd Qu.: 42.00 3rd Qu.:33.50 3rd Qu.:286.0 3rd Qu.: 736.0
## Max. :200.00 Max. :42.00 Max. :369.0 Max. :1121.0
## Calories TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. : 928 Min. :1.000 Min. :115.0 Min. :129.0
## 1st Qu.:1852 1st Qu.:1.000 1st Qu.:399.0 1st Qu.:420.0
## Median :2039 Median :1.000 Median :442.0 Median :455.0
## Mean :2052 Mean :1.086 Mean :430.3 Mean :449.8
## 3rd Qu.:2168 3rd Qu.:1.000 3rd Qu.:472.5 3rd Qu.:494.0
## Max. :4552 Max. :3.000 Max. :630.0 Max. :679.0
## WeightKg BMI Weekday
## Min. : 52.60 Min. :22.65 Length:35
## 1st Qu.: 61.20 1st Qu.:23.89 Class :character
## Median : 61.50 Median :24.00 Mode :character
## Mean : 64.17 Mean :24.83
## 3rd Qu.: 61.90 3rd Qu.:24.17
## Max. :133.50 Max. :47.54
First, we will analyze the activity table.
ggplot(activity, aes(x = TotalSteps , y = Calories)) + geom_point()+geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(activity, aes(x = TotalSteps , y = TotalDistance)) + geom_point()+geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
From the two graphs above, TotalSteps and TotalDistance show a strong positive correlation, which is expected since walking is the main contributor to both variables. The relationship between TotalSteps and Calories is looser, because calories also reflect activities that burn energy without adding steps, such as weightlifting or yoga. Step counts alone are therefore an incomplete measure of total physical activity, and additional factors such as exercise intensity or heart rate are needed for a more complete picture of energy expenditure.
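The visual comparison of the two scatter plots can be backed by a number. A minimal sketch using base R's cor() on toy data follows; the values are hypothetical and chosen only so that distance tracks steps exactly while calories vary more.

```r
# Toy data (hypothetical values), not taken from the Fitbit tables.
toy <- data.frame(
  TotalSteps = c(1000, 4000, 7000, 10000, 13000),
  Calories   = c(1900, 1700, 2600, 2100, 2800)
)
# Distance is built as a fixed multiple of steps, so it is perfectly correlated.
toy$TotalDistance <- toy$TotalSteps * 0.0007

# Pearson correlation: values near 1 mean a tight linear relationship.
r_distance <- cor(toy$TotalSteps, toy$TotalDistance)
r_calories <- cor(toy$TotalSteps, toy$Calories)

r_distance > r_calories  # TRUE for this toy data
```

On the real table the same comparison would be `cor(activity$TotalSteps, activity$TotalDistance)` against `cor(activity$TotalSteps, activity$Calories)`.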
# Total tracked minutes over all rows, divided by 100 so that the
# ratios below come out directly as percentages.
total <- (sum(activity$SedentaryMinutes) + sum(activity$LightlyActiveMinutes) +
  sum(activity$FairlyActiveMinutes) + sum(activity$VeryActiveMinutes)) / 100
sedentary_percentage <- sum(activity$SedentaryMinutes) / total
lightlyActive_percentage <- sum(activity$LightlyActiveMinutes) / total
fairlyActive_percentage <- sum(activity$FairlyActiveMinutes) / total
veryActive_percentage <- sum(activity$VeryActiveMinutes) / total
percentage <- data.frame(
  level = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"),
  percentage = c(sedentary_percentage, lightlyActive_percentage, fairlyActive_percentage, veryActive_percentage)
)
pie(
  percentage$percentage,
  labels = paste(percentage$level, "-", round(percentage$percentage, 1), "%"),
  col = rainbow(length(percentage$percentage)),
  main = "Percentage of Various Activity Levels"
)
From the pie chart, we can see that users as a whole spent 81.3% of their tracked time in a sedentary state and only 1.74% in a very active state.
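These shares can be cross-checked against the means printed by summary(activity). The sketch below assumes only those four printed means; since every minutes column has the same 940 rows, shares of the means equal shares of the column sums.

```r
# Mean minutes per day, copied from the summary(activity) output in this report.
mean_minutes <- c(
  Sedentary     = 991.2,
  LightlyActive = 192.8,
  FairlyActive  = 13.56,
  VeryActive    = 21.16
)

# Share of each activity level in the total tracked minutes, in percent.
share <- mean_minutes / sum(mean_minutes) * 100
round(share, 2)
# Sedentary 81.33, LightlyActive 15.82, FairlyActive 1.11, VeryActive 1.74
```

The printed means are rounded, so the check is approximate, but it agrees with the values read from the pie chart.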
# Named weekday_levels so that the base function weekdays() is not shadowed.
weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
activity$Weekday <- factor(activity$Weekday, levels = weekday_levels)
ggplot(data = activity, aes(x = Weekday, y = TotalSteps, fill = Weekday)) +
  geom_bar(stat = "identity") +
  ylab("Total Steps")
Weekdays have higher total steps than weekends: the chart shows that total steps are generally higher on weekdays (Monday to Friday) than on weekends (Saturday and Sunday). This could suggest that users are more active during the week, possibly because of work-related activity or other weekday routines. Because each bar is a sum over all records, a weekday with more records in the tracking window also gets a taller bar, so average steps per day are the safer basis for this comparison.
Total steps are lowest on weekends: the lowest totals fall on weekends, particularly on Sundays. This could suggest that users are less active on weekends, possibly because of leisure activities or rest days.
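How often each weekday occurs in the tracking window matters when bars show sums. The sketch below counts the weekdays between the first and last date reported by summary(activity), then uses toy data with hypothetical values to show how a sum and a mean can rank days differently.

```r
# Tracking window taken from the Min. and Max. of date in summary(activity).
days <- seq(as.Date("2016-04-12"), as.Date("2016-05-12"), by = "day")

# "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday), which does
# not depend on the locale the way weekday names do.
iso_day <- as.integer(format(days, "%u"))
day_counts <- table(factor(iso_day, levels = 1:7,
                           labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")))
day_counts
# Mon Tue Wed Thu Fri Sat Sun
#   4   5   5   5   4   4   4

# Toy data (hypothetical values): three Tuesday rows, two Saturday rows.
toy <- data.frame(
  Weekday = c("Tue", "Tue", "Tue", "Sat", "Sat"),
  TotalSteps = c(8000, 8000, 8000, 9000, 9000)
)
step_sum  <- aggregate(TotalSteps ~ Weekday, data = toy, FUN = sum)   # Tue is larger
step_mean <- aggregate(TotalSteps ~ Weekday, data = toy, FUN = mean)  # Sat is larger
```

On the real table, `aggregate(TotalSteps ~ Weekday, data = activity, FUN = mean)` would give the per-day averages.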
Now, we will analyze the sleep data.
ggplot(data = sleep_merged, aes(x = TotalMinutesAsleep, y = Calories)) +
  geom_col(linewidth = 3) + geom_smooth(col = "red") +
  theme_classic()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
This chart shows the relation between activity level, measured as calories burned, and sleep duration. With some activity, even very little, there is a marked increase in normal sleepers (7 to 8 hours).
The important insight here is the decrease in oversleepers (more than 8 hours) in the category with the most calories burned.
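The sleeper groups named here can be made explicit with a small helper. This is a hypothetical sketch: the function name `sleep_category`, the labels, and the treatment of exactly 7 and 8 hours as "normal" are assumptions; the sample values include the minimum, median, and maximum of TotalMinutesAsleep from summary(sleep_merged).

```r
# Hypothetical helper: bucket minutes asleep into three groups.
# Cut-offs: under 7 h, 7 to 8 h (inclusive), over 8 h. Labels are illustrative.
sleep_category <- function(minutes_asleep) {
  ifelse(minutes_asleep < 7 * 60, "short",
         ifelse(minutes_asleep <= 8 * 60, "normal", "over"))
}

sample_minutes <- c(58, 420, 433, 480, 796)
sleep_category(sample_minutes)
# "short" "normal" "normal" "normal" "over"

# Counting rows per group is the first step towards comparing the groups.
table(sleep_category(sample_minutes))
```

Applied to the merged table, `sleep_category(sleep_merged$TotalMinutesAsleep)` would give one label per row, which could then be compared against Calories.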
ggplot(weight_merged, aes(x = TotalSteps, y = WeightKg)) +
  geom_point(alpha = 0.5) + geom_smooth(col = "pink") +
  labs(x = "Activity Level (Total Steps)", y = "Weight (kg)") +
  ggtitle("Relationship between activity level and weight for Fitbit users")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
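The plot suggests that higher physical activity goes together with lower weight. This reading is tentative: weight_merged has only 35 rows, and the Id quartiles in its summary are identical from the first quartile to the maximum, so most rows come from a single user. The data also records weight at a point in time, not weight loss.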
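Before reading a trend from a small merged table, it helps to check how many users contribute and whether the trend holds within a user. The sketch below uses a toy stand-in for weight_merged with hypothetical values, in which one user supplies most of the rows.

```r
# Toy stand-in for weight_merged (hypothetical values).
toy <- data.frame(
  Id = c(101, 101, 101, 101, 202),
  TotalSteps = c(9000, 11000, 12000, 10000, 4000),
  WeightKg = c(61.2, 61.5, 61.4, 61.9, 85.0)
)

# How many rows does each user contribute?
rows_per_user <- table(toy$Id)
n_users <- length(rows_per_user)

# Correlation across all rows: driven here by the difference between users.
cor_all <- cor(toy$TotalSteps, toy$WeightKg)

# Correlation within the user who has the most rows.
main_id <- as.numeric(names(which.max(rows_per_user)))
main <- toy[toy$Id == main_id, ]
cor_main <- cor(main$TotalSteps, main$WeightKg)

c(pooled = cor_all, within_main_user = cor_main)
```

In this toy data the pooled correlation is negative while the within-user correlation is positive, which is why a pooled scatter plot over very few users should be read with care. On the real table, `table(weight_merged$Id)` gives the per-user row counts.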
1. Users are active during the day but also spend a large share of their tracked time sedentary. The tables used here carry no gender field, so applying this to Bellabeat's female customers is an assumption. The pattern suggests that women are fitting physical activity around their busy schedules but may be spending too much time sitting.
This insight provides an opportunity for Bellabeat to develop new features that encourage women to take more breaks and engage in light activity throughout the day, such as reminders to stand up and stretch or to take a short walk. By addressing this specific need of their target market, Bellabeat can differentiate themselves from competitors and provide value to their customers.
Furthermore, this insight highlights the importance of understanding the unique behaviors and needs of a specific target market. By tailoring product features and marketing strategies to the specific needs of their customers, companies can create more value and build stronger relationships with their customers.
2. Users with longer and more consistent sleep tended to be more active during the day. This suggests a connection between sleep and physical activity, and that improving one may lead to improvements in the other.
This insight provides an opportunity for Bellabeat to develop new features that help customers improve their sleep quality and consistency, such as sleep tracking and analysis, personalized sleep recommendations, and guided meditations or relaxation exercises. By focusing on both sleep and physical activity, Bellabeat can provide a more holistic approach to wellness and differentiate themselves from competitors.
Additionally, this insight highlights the importance of taking a comprehensive approach to understanding customer behavior and needs. By analyzing multiple aspects of customers’ lives, such as activity levels, sleep patterns, stress levels, and menstrual cycles, Bellabeat can gain a more complete understanding of its customers and identify opportunities for improving its product offerings.
Image by redgreystock on Freepik